Markov processes follow from the principle of Maximum Caliber 

Markov models are widely used to describe processes of stochastic dynamics. Here, we show that Markov models are a natural consequence of the dynamical principle of Maximum Caliber. First, we show that when there are different possible dynamical trajectories in a time-homogeneous process, the only type of process that maximizes the path entropy, for any given singlet statistics, is a sequence of independent and identically distributed (i.i.d.) random variables, which is the simplest Markov process. If the data are in the form of sequential pairwise statistics, then maximizing the caliber dictates that the process is Markovian with a uniform initial distribution. Furthermore, if an initial non-uniform dynamical distribution is known, or multiple trajectories are conditioned on an initial state, then the Markov process is still the only one that maximizes the caliber. Second, given a model, MaxCal can be used to compute the parameters of that model. We show that this procedure is equivalent to the maximum-likelihood method of inference in the theory of statistics.

I. INTRODUCTION 

E.T. Jaynes' principle of maximum entropy (MaxEnt) has wide applications in engineering and science [1–13], and serves as one description of the foundation for equilibrium statistical mechanics [...]. In recent years, this principle has been generalized to treat time-dependent statistical phenomena, in which form it is sometimes called the principle of maximum caliber (MaxCal) [...].

Markov processes [13], whose transition densities are described by Chapman-Kolmogorov equations [...], are a common starting point in modeling stochastic dynamics. Here we answer, from a maximum entropy standpoint, the question "why start with a Markov process?"

In the present paper, we show that the application of MaxCal to time-homogeneous data yields precisely a Markov process. A Markov process model comes with a set of parameters, namely the transition probabilities. These are directly related to the Lagrange multipliers generated by the principle of MaxCal. In determining these parameters, the MaxCal approach coincides with the method of maximum likelihood used in statistical parameter estimation.

The theory of Maximum Caliber, in other words the idea of maximizing path entropy subject to constraints, was first tackled in the context of Markov processes by Filyukov and Karpov in 1967 [1], as far as the current authors are aware. In that work, it was assumed that a trajectory could be broken down as a Markov chain. Motivated by arguments from Khinchin, Filyukov and Karpov argued that maximizing the breadth of the trajectory ensemble, given data on the Markov transition probabilities, was a recipe for inferring a trajectory ensemble consistent with observation. Similar reasoning has been used in recent work [..., 18].



Here we work in reverse and ask: given particular constraints, is it possible to infer that the Markov model is the one that maximizes the entropy? The answer will turn out to be yes, as maximum entropy is well known to set correlations between independent probabilities (or, more generally, independent subsets of a map) to zero unless the data provide evidence for correlation. This derives from the axioms used in obtaining the entropy formula, namely "subset independence" [13, ...], a property originally enforced by Shannon through his "composition property" as a logical consistency requirement for any inference [22]. Hence, given independent data on transition probabilities, the probability of a trajectory becomes the product of transition probabilities. We verify that the Markov process follows from maximum entropy together with the consistency conditions for path probabilities noted by A. Kolmogorov [...].

II. THE MAXIMUM-CALIBER APPROACH

Suppose we have a stochastic process with a discrete state space $S = \{1, 2, \ldots, N\}$.

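Consider its trajectories of length $T$, and denote the probability of the trajectory $\{i_0, i_1, \ldots, i_T\}$ as $p_{i_0 i_1 \cdots i_T}$, where $i_n$ is the state visited at time $n$. The path entropy, $S(T)$, is

$$ S(T) = -\sum_{i_0, i_1, \ldots, i_T} p_{i_0 i_1 \cdots i_T} \log p_{i_0 i_1 \cdots i_T}. \tag{1} $$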



Eq. (1) is the correct measure which, when maximized subject to constraints, yields the least biased set $\{p_{i_0 i_1 \cdots i_T}\}$ consistent with observation [13, ...]. Furthermore, if $\{p_{i_0 i_1 \cdots i_T}\}$ is a path probability, then it must satisfy certain logical consistency requirements first established by A. Kolmogorov [...].

Specifically, for $T = 0$ or any positive integer $T$, the probability distribution has to satisfy

$$ \sum_{i_{T+1}} p_{i_0 i_1 \cdots i_T i_{T+1}} = p_{i_0 i_1 \cdots i_T}, \tag{2} $$

i.e., the distribution of the process during the time interval $\{0, 1, 2, \ldots, T\}$ should be the marginal distribution of the process during the time interval $\{0, 1, 2, \ldots, T+1\}$.
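As a minimal numerical sketch of Eqs. (1) and (2), the Python below enumerates every trajectory of a two-state Markov chain whose initial distribution `p0` and transition matrix `P` are hypothetical, made-up values (not data from this paper). It evaluates the path entropy $S(T)$ by brute force and checks that summing over $i_{T+1}$ recovers the length-$T$ path probabilities.

```python
import itertools
import math

def path_prob(path, p0, P):
    """Probability of a trajectory (i_0, ..., i_T) under a Markov chain
    with initial distribution p0 and transition matrix P."""
    prob = p0[path[0]]
    for a, b in zip(path, path[1:]):
        prob *= P[a][b]
    return prob

def path_entropy(p0, P, T):
    """Path entropy S(T) of Eq. (1), by enumerating all N**(T+1) trajectories."""
    S = 0.0
    for path in itertools.product(range(len(p0)), repeat=T + 1):
        p = path_prob(path, p0, P)
        if p > 0:
            S -= p * math.log(p)
    return S

def is_consistent(p0, P, T, tol=1e-12):
    """Eq. (2): summing p_{i_0...i_T i_{T+1}} over i_{T+1} must give p_{i_0...i_T}."""
    N = len(p0)
    for path in itertools.product(range(N), repeat=T + 1):
        marginal = sum(path_prob(path + (j,), p0, P) for j in range(N))
        if abs(marginal - path_prob(path, p0, P)) > tol:
            return False
    return True

# Hypothetical two-state example (illustrative values only).
p0 = [0.5, 0.5]
P = [[0.9, 0.1], [0.3, 0.7]]
print(path_entropy(p0, P, 3), is_consistent(p0, P, 3))
```

Brute-force enumeration is exponential in $T$, so this is only meant for very short trajectories.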
We consider three different types of constraints.

(1) Constraint on the mean of the singlet distribution. First, we suppose we only know the mean number of time intervals during which the system dwells in each state $m$, $A_m$, in a trajectory of length $T$, where

$$ A_m(T) = \sum_{i_0, i_1, \ldots, i_T} p_{i_0 i_1 \cdots i_T} \sum_{k=0}^{T} \delta_{i_k, m}, \tag{3} $$

where $\delta_{i,j} = 1$ when the integers $i = j$, and $0$ otherwise. $\sum_{k=0}^{T} \delta_{i_k, m}$ is the number of time intervals during which state $m$ is occupied in the trajectory $\{i_0, i_1, \ldots, i_T\}$, and $\sum_m A_m = T + 1$. We call these data singlet statistics.

Next, we maximize Eq. (1) subject to constraints on our singlet statistics, imposed with Lagrange multipliers $\lambda_m$. That is, we maximize the caliber, $S(T) - \sum_m \lambda_m A_m(T)$, with respect to $p_{i_0 i_1 \cdots i_T}$.



We conclude that

$$ p_{i_0 i_1 \cdots i_T} \propto \prod_{k=0}^{T} e^{-\lambda_{i_k}}. \tag{4} $$
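For completeness, the stationarity condition that leads to Eq. (4) can be written out. Here $\mu$ is a multiplier for normalization, a symbol introduced only for this step, and we use $\sum_m \lambda_m \sum_{k=0}^{T} \delta_{i_k, m} = \sum_{k=0}^{T} \lambda_{i_k}$:

$$ \frac{\partial}{\partial p_{i_0 \cdots i_T}} \left[ S(T) - \sum_m \lambda_m A_m(T) - \mu \left( \sum_{i_0, \ldots, i_T} p_{i_0 \cdots i_T} - 1 \right) \right] = -\log p_{i_0 \cdots i_T} - 1 - \sum_{k=0}^{T} \lambda_{i_k} - \mu = 0, $$

so that $p_{i_0 \cdots i_T} = e^{-1-\mu} \prod_{k=0}^{T} e^{-\lambda_{i_k}}$, with $\mu$ fixed by normalization.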



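Thus, including normalization, the probability becomes

$$ p_{i_0 i_1 \cdots i_T} = \frac{\prod_{k=0}^{T} e^{-\lambda_{i_k}}}{\sum_{i_0, i_1, \ldots, i_T} \prod_{k=0}^{T} e^{-\lambda_{i_k}}} = \prod_{k=0}^{T} p_{i_k}(T), \tag{5} $$

where $p_m(T) = e^{-\lambda_m} / \sum_n e^{-\lambda_n}$; the second equality holds because the denominator factorizes into $\left( \sum_n e^{-\lambda_n} \right)^{T+1}$.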



By definition,

$$ A_m = \sum_{i_0, i_1, \ldots, i_T} p_{i_0 i_1 \cdots i_T} \sum_{k=0}^{T} \delta_{i_k, m} = (T + 1)\, p_m(T). \tag{6} $$



As a final note on this example, we consider the result of imposing the consistency condition, Eq. (2), on the distribution obtained above. From this we obtain

$$ \sum_{i_{T+1}} p_{i_0 \cdots i_T i_{T+1}} = \sum_{i_{T+1}} \prod_{k=0}^{T+1} p_{i_k}(T + 1) = \prod_{k=0}^{T} p_{i_k}(T + 1), \tag{7} $$

but since $\sum_{i_{T+1}} p_{i_0 \cdots i_T i_{T+1}} = p_{i_0 \cdots i_T} = \prod_{k=0}^{T} p_{i_k}(T)$, it then follows that

$$ \prod_{k=0}^{T} p_{i_k}(T + 1) = \prod_{k=0}^{T} p_{i_k}(T), \tag{8} $$

which gives

$$ p_{i_T}(T) = p_{i_T}(T + 1) \tag{9} $$

for any $i_T$. The proof is straightforward: changing $i_T$ to another state $i_T'$ in Eq. (8) gives

$$ \prod_{k=0}^{T-1} p_{i_k}(T + 1) \cdot p_{i_T'}(T + 1) = \prod_{k=0}^{T-1} p_{i_k}(T) \cdot p_{i_T'}(T), \tag{10} $$

which, when compared with Eq. (8), yields

$$ \frac{p_{i_T}(T + 1)}{p_{i_T'}(T + 1)} = \frac{p_{i_T}(T)}{p_{i_T'}(T)}. $$

Then, since both $\sum_i p_i(T + 1)$ and $\sum_i p_i(T)$ are equal to one, Eq. (9) follows.

Under such constraints, MaxCal thus yields an independent and identically distributed (i.i.d.) process. Eq. (5) is the statement of independence and Eq. (9) is the statement of identical distributions.
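This conclusion can be illustrated numerically (though not proved) by comparing two processes with identical singlet statistics. The sketch below assumes a hypothetical two-state distribution $\pi = (0.75, 0.25)$ and compares the i.i.d. process with a correlated Markov chain `P_corr` (made-up values whose stationary distribution is also $\pi$), both started in $\pi$. Both give $A_m(T) = (T+1)\pi_m$, but the i.i.d. process has the larger path entropy.

```python
import itertools
import math

def path_entropy(p0, P, T):
    """Brute-force path entropy S(T), Eq. (1), for a Markov chain (p0, P)."""
    S = 0.0
    for path in itertools.product(range(len(p0)), repeat=T + 1):
        p = p0[path[0]]
        for a, b in zip(path, path[1:]):
            p *= P[a][b]
        if p > 0:
            S -= p * math.log(p)
    return S

def singlet_counts(p0, P, T):
    """A_m(T) of Eq. (3): expected number of time points spent in each state."""
    N = len(p0)
    dist = list(p0)
    A = [0.0] * N
    for _ in range(T + 1):
        for m in range(N):
            A[m] += dist[m]
        # propagate the one-time distribution by one step
        dist = [sum(dist[i] * P[i][j] for i in range(N)) for j in range(N)]
    return A

T = 4
pi = [0.75, 0.25]                   # hypothetical singlet distribution
P_iid = [pi, pi]                    # every row equals pi: i.i.d. draws
P_corr = [[0.9, 0.1], [0.3, 0.7]]   # hypothetical correlated chain; stationary distribution is pi

S_iid = path_entropy(pi, P_iid, T)
S_corr = path_entropy(pi, P_corr, T)
print(singlet_counts(pi, P_iid, T), singlet_counts(pi, P_corr, T))
print(S_iid, S_corr)
```

For the i.i.d. case the path entropy equals $(T+1)$ times the single-step entropy of $\pi$, while correlations lower it.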

(2) Constraint on pairwise statistics. Now we consider a situation in which the constraint is instead on the pairwise statistics for each step $m \to n$ in the time period $[0, T]$, i.e.

$$ A_{mn}(T) = \sum_{i_0, i_1, \ldots, i_T} p_{i_0 i_1 \cdots i_T} \sum_{k=0}^{T-1} \delta_{i_k, m} \delta_{i_{k+1}, n}. \tag{11} $$

Here $\sum_{k=0}^{T-1} \delta_{i_k, m} \delta_{i_{k+1}, n}$ is just the number of occurrences of the transition $m \to n$, and $\sum_{m,n} A_{mn} = T$.

Then, maximizing Eq. (1) subject to constraints on the pairwise statistics, imposed using Lagrange multipliers $\lambda_{mn}$, we conclude that

$$ p_{i_0 i_1 \cdots i_T} \propto e^{-\sum_{k=0}^{T-1} \lambda_{i_k i_{k+1}}} = \prod_{k=0}^{T-1} p_{i_k i_{k+1}}(T), \tag{12} $$

where $p_{i_k i_{k+1}}(T) = e^{-\lambda_{i_k i_{k+1}}}$.

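Next, applying the consistency condition, Eq. (2), we find by a method similar to the previous case (i.e., replacing $i_T$ with $i_T'$ and considering its difference from the original form) that

$$ p_{i_0 i_1 \cdots i_T} = \frac{1}{N} \prod_{k=0}^{T-1} p_{i_k i_{k+1}}, \qquad \sum_j p_{ij} = 1. \tag{13} $$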



That is, under constraints on transitions, the only stochastic process consistent with maximum caliber is the Markov process with a uniform initial distribution. The Markov property is seen directly from the product form in Eq. (12). We use the equation above to justify that transition probabilities are independent, in the same way that we used a similar expression to argue independence of the i.i.d. process when we had singlet statistics.
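A minimal check of this conclusion under assumed toy numbers: the sketch builds path distributions of the MaxCal form $p_{i_0 \cdots i_T} \propto s_{i_0} \prod_k w_{i_k i_{k+1}}$, where the start weights `start` and the weight matrices `P` and `W` are hypothetical, and tests the consistency condition, Eq. (2), at one value of $T$. With a uniform start and a row-stochastic matrix the condition holds; with weights whose rows have different sums it fails. It also checks that the pairwise statistics of Eq. (11) sum to $T$.

```python
import itertools

def weight_form(W, T, start):
    """Normalized p_{i_0...i_T} proportional to start[i_0] * prod_k W[i_k][i_{k+1}]."""
    N = len(W)
    weights = {}
    for path in itertools.product(range(N), repeat=T + 1):
        w = start[path[0]]
        for a, b in zip(path, path[1:]):
            w *= W[a][b]
        weights[path] = w
    Z = sum(weights.values())
    return {path: w / Z for path, w in weights.items()}

def consistent(W, T, start, tol=1e-12):
    """Eq. (2): does marginalizing the length-(T+1) ensemble give the length-T one?"""
    N = len(W)
    short = weight_form(W, T, start)
    longer = weight_form(W, T + 1, start)
    return all(abs(sum(longer[path + (j,)] for j in range(N)) - short[path]) < tol
               for path in short)

def pair_counts(probs, N):
    """A_mn(T) of Eq. (11): expected number of m -> n transitions."""
    A = [[0.0] * N for _ in range(N)]
    for path, p in probs.items():
        for a, b in zip(path, path[1:]):
            A[a][b] += p
    return A

uniform = [1.0, 1.0]           # uniform (unnormalized) start weights
P = [[0.9, 0.1], [0.3, 0.7]]   # hypothetical row-stochastic matrix
W = [[2.0, 1.0], [1.0, 1.0]]   # hypothetical weights; rows sum to 3 and 2
print(consistent(P, 3, uniform), consistent(W, 3, uniform))
```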

(3) What if the initial distribution is not uniform? If the initial distribution is not uniform, the MaxCal method can still be used, but with an entropy conditioned on an initial state, as follows:

$$ S(T \mid i_0) = -\sum_{i_1, \ldots, i_T} p_{i_1 \cdots i_T \mid i_0} \log p_{i_1 \cdots i_T \mid i_0}, \tag{14} $$

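with constraints

$$ (\text{number of transitions } m \to n \mid i_0) = \sum_{i_1, \ldots, i_T} p_{i_1 \cdots i_T \mid i_0} \sum_{k=0}^{T-1} \delta_{i_k, m} \delta_{i_{k+1}, n}. \tag{15} $$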



The unconditioned trajectory probabilities are simply the conditioned ones reweighted by the initial distribution, $p_{i_0 i_1 \cdots i_T} = p_{i_0}(0)\, p_{i_1 \cdots i_T \mid i_0}$. By selecting the trajectories with the same initial state from an unconditioned ensemble of trajectories, one reproduces the distribution of the conditioned trajectory ensemble. Conversely, by recombining trajectories with different initial states, one can obtain the distribution of the unconditioned trajectory ensemble.
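The reweighting and selection described above can be checked on a toy example. The sketch assumes a hypothetical transition matrix `P` and a hypothetical non-uniform initial distribution `p_init`; it builds the unconditioned ensemble by weighting each conditioned trajectory probability by the probability of its initial state, then recovers each conditioned ensemble by selecting trajectories with a given initial state and renormalizing.

```python
import itertools

P = [[0.9, 0.1], [0.3, 0.7]]   # hypothetical transition matrix
p_init = [0.2, 0.8]            # hypothetical non-uniform initial distribution
T = 3

def conditional(path, P):
    """p_{i_1...i_T | i_0}: product of transition probabilities along the path."""
    prob = 1.0
    for a, b in zip(path, path[1:]):
        prob *= P[a][b]
    return prob

# Unconditioned ensemble: reweight each conditioned ensemble by p_init[i_0].
joint = {path: p_init[path[0]] * conditional(path, P)
         for path in itertools.product(range(2), repeat=T + 1)}

def select(joint, s):
    """Keep trajectories starting in state s and renormalize them."""
    sub = {path: p for path, p in joint.items() if path[0] == s}
    Z = sum(sub.values())
    return {path: p / Z for path, p in sub.items()}

print(select(joint, 0)[(0, 0, 0, 0)], conditional((0, 0, 0, 0), P))
```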

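To be precise, if we had no information regarding the distribution of initial conditions but had precise information regarding the initial state of a specific measurement, then we could only apply the conditional likelihood function

$$ L(\{p_{ij}\}) = \prod_{k=0}^{T-1} p_{i_k i_{k+1}} = \prod_{i,j} p_{ij}^{A_{ij}(T)}, $$

where $A_{ij}(T)$ is the number of $i \to j$ transitions in the observed trajectory, with constraints $\sum_j p_{ij} = 1$ for each $i$.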

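Then, applying a Lagrange multiplier method similar to the one used to maximize the caliber, we set $H = L - \sum_i \lambda_i \left( \sum_j p_{ij} - 1 \right)$, where $H$ is the likelihood subject to a constraint on normalization, and

$$ \frac{\partial H}{\partial p_{ij}} = \frac{A_{ij}(T)\, L(\{p_{ij}\})}{p_{ij}} - \lambda_i = 0, $$

which is followed by

$$ p_{ij} \propto A_{ij}(T). $$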


Then, applying the Lagrange multiplier method, we would also conclude that

$$ p_{i_1 \cdots i_T \mid i_0} = \prod_{k=0}^{T-1} p_{i_k i_{k+1}}. $$



Finally, we can easily derive a relation between $A_{mn}$, a property of the data, and $\lambda_{mn}$, the Lagrange multiplier. To do so, we define the dynamical partition function

$$ Q_d(T) = \sum_{i_0, i_1, \ldots, i_T} \prod_{k=0}^{T-1} e^{-\lambda_{i_k i_{k+1}}}. \tag{16} $$

Taking the derivative of this dynamical partition function gives the average number of $i \to p$ transitions, $A_{ip} = -\partial \log Q_d(T) / \partial \lambda_{ip}$ (resembling the way that taking derivatives of equilibrium partition functions yields equilibrium averages and higher cumulants). Summed over $i$, this counts the time intervals spent in state $p$; when $T$ is large enough, we have $\sum_i A_{ip}(T)/T \approx \pi_p$, where $\pi_p$ is the stationary distribution of state $p$.
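The derivative relation can be checked numerically. The sketch below, with hypothetical multipliers `lam`, evaluates Eq. (16) through a transfer matrix $W_{ij} = e^{-\lambda_{ij}}$, so that $Q_d(T)$ is the sum of all entries of the $T$-th matrix power of $W$ (a standard identity, not something stated in the text). It takes central finite differences of $-\log Q_d$ with respect to each $\lambda_{ip}$ and compares them with expected transition counts obtained by enumerating all trajectories.

```python
import itertools
import math

def Q_d(lam, T):
    """Eq. (16) via a transfer matrix: sum of all entries of W**T, W_ij = exp(-lam_ij)."""
    N = len(lam)
    W = [[math.exp(-lam[i][j]) for j in range(N)] for i in range(N)]
    v = [1.0] * N
    for _ in range(T):
        v = [sum(W[i][j] * v[j] for j in range(N)) for i in range(N)]
    return sum(v)

def counts_by_enumeration(lam, T):
    """Expected i -> p transition counts under p proportional to exp(-sum_k lam_{i_k i_{k+1}})."""
    N = len(lam)
    weights = {path: math.exp(-sum(lam[a][b] for a, b in zip(path, path[1:])))
               for path in itertools.product(range(N), repeat=T + 1)}
    Z = sum(weights.values())
    A = [[0.0] * N for _ in range(N)]
    for path, w in weights.items():
        for a, b in zip(path, path[1:]):
            A[a][b] += w / Z
    return A

def counts_from_Q(lam, T, h=1e-6):
    """A_ip = -d log Q_d / d lam_ip, by central finite differences."""
    N = len(lam)
    A = [[0.0] * N for _ in range(N)]
    for i in range(N):
        for p in range(N):
            up = [row[:] for row in lam]
            dn = [row[:] for row in lam]
            up[i][p] += h
            dn[i][p] -= h
            A[i][p] = -(math.log(Q_d(up, T)) - math.log(Q_d(dn, T))) / (2 * h)
    return A

lam = [[0.1, 1.2], [0.8, 0.3]]   # hypothetical Lagrange multipliers
T = 4
print(counts_by_enumeration(lam, T))
print(counts_from_Q(lam, T))
```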



III. DATA ANALYSIS PROCEDURE 



We have proved that the MaxCal approach gives a Markov chain. In practice, however, the constraints $A_{ij}(T)$ should be replaced by the sample estimators $\hat{A}_{ij}(T)$, i.e.

$$ \hat{p}_{ij} = \frac{\hat{A}_{ij}(T)}{\sum_l \hat{A}_{il}(T)}. $$



Knowing an initial distribution $p_i(0)$ simply adds multiplicative constants to the likelihood function. The same holds for multiple trajectories, except that we must group together the trajectories with different initial states.
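One way to read the multiple-trajectory case, sketched below under the assumption that all trajectories share the same transition matrix: the likelihood is a product over trajectories, the initial-state factors are constants that do not affect the maximizer, and so the transition counts $\hat{A}_{ij}$ can be pooled across trajectories before row-normalizing. The function name `pooled_transition_estimate` and the toy trajectories are hypothetical.

```python
from collections import Counter

def pooled_transition_estimate(trajectories, n_states):
    """Hypothetical helper: hat p_ij = hat A_ij / sum_l hat A_il, with counts
    pooled over all trajectories. Transitions are counted only within each
    trajectory, never across the boundary between two trajectories."""
    counts = Counter()
    for traj in trajectories:
        counts.update(zip(traj, traj[1:]))
    P = []
    for i in range(n_states):
        total = sum(counts[(i, j)] for j in range(n_states))
        if total == 0:
            raise ValueError(f"state {i} is never left; its row is undetermined")
        P.append([counts[(i, j)] / total for j in range(n_states)])
    return P

# Two hypothetical trajectories with different initial states.
trajs = [[0, 0, 1, 1, 0], [1, 1, 1, 0, 0, 0]]
print(pooled_transition_estimate(trajs, 2))
```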

Thus, maximizing the conditional caliber in dynamical 
modeling is equivalent to maximizing the likelihood. 



IV. CONCLUSIONS AND DISCUSSION 

Maximum caliber can be used both as a first principle from which to derive stochastic dynamical models and as a data analysis method. We showed here that Markovian dynamics follows from MaxCal; the Markov property is a natural consequence of maximizing a dynamical entropy over trajectories for time-homogeneous processes. Our treatment can be generalized to handle dynamical systems having finite memory (time delays), corresponding to similar situations in inference for time-series analysis [24].